Abstract

We investigate the algorithmic problems of the homophyly phenomenon in networks.
Given an undirected graph $G = (V, E)$ and a vertex coloring $c\colon V \to \{1, 2, \ldots, k\}$ of $G$,
we say that a vertex $v \in V$ is happy if $v$ shares the same color with all its neighbors, and
unhappy otherwise, and that an edge $e \in E$ is happy if its two endpoints have the same
color, and unhappy otherwise. Supposing $c$ is a partial vertex coloring of $G$, we define the
Maximum Happy Vertices problem (MHV, for short) as coloring all the remaining vertices
such that the number of happy vertices is maximized, and the Maximum Happy Edges
problem (MHE, for short) as coloring all the remaining vertices such that the number of
happy edges is maximized.

Let $k$ be the number of colors allowed in the problems. We show that both MHV and
MHE can be solved in polynomial time if $k = 2$, and that both MHV and MHE are NP-hard
if $k \ge 3$. We devise a $\max\{1/k, \Omega(\Delta^{-3})\}$-approximation algorithm for the MHV problem,
where $\Delta$ is the maximum degree of vertices in the input graph, and a 1/2-approximation
algorithm for the MHE problem. This is the first theoretical progress on these two natural
and fundamental new problems.
1 Introduction

Networks or at least social networks heavily depend on human or social behaviors. It is believed 
that homophyly [H Chapter 4] is one of the most basic notions governing the structure of social 
networks. It is a common-sense principle that people are more likely to connect with people
they like, as the proverb says: "birds of a feather flock together".

Li and Peng in [144 [To] gave a mathematical definition of community, and small community 
phenomenon of networks, and showed that networks from some classic models do satisfy the 
small community phenomenon. A. Li and J. Li et al. [12] proposed a homophyly model by 
introducing a color for every vertex in the classical preferential attachment model such that 
networks generated from this model satisfy simultaneously the following properties: 1) power 
law degree distribution, 2) small diameter property, 3) vertices of the same color naturally form 
a small community, and 4) almost all vertices are contained in some small communities, i.e., the 
small community phenomenon of networks. This result implies the homophyly law of networks 
that the mechanism of the small community phenomenon is homophyly, and that vertices within 
a small community share remarkable common features. 
A. Li and J. Li et al. [13] showed that many real networks satisfy exactly the homophyly
law; an interesting application is the prediction and confirmation of keywords from a
paper citation network of high energy physics theory (see footnote 1). The network contains 27,770 vertices
(i.e., papers) and 352,807 edges (i.e., citations). All the papers have titles and abstracts, but
only 1,214 papers have keywords listed by their authors. We interpret the keywords of a paper
to be a function of the paper. By the homophyly law, vertices within a small community of 
the network must share remarkable common features (keywords here). The prediction is as 
follows: 1) to find a small community from each vertex, if any, 2) to extract the most popular 5 
keywords from the known keywords in a community, as the remarkable common features of this 
community, 3) to predict that (all or part of) the 5 remarkable common keywords are keywords 
of a paper in the community, 4) to confirm a prediction of keyword K for a paper P, if K 
appears in either the title or the abstract of paper P. It is a surprising result that this simple 
prediction confirms keywords for 19,200 papers in the network. This experiment implies that
real networks do satisfy the homophyly law, and that the homophyly law is the principle for 
prediction in networks. 

The keywords can be viewed as the attributes of vertices in a network. The above experimental
result suggests a natural theoretical problem: given a network in which some
vertices have their attributes unfixed, how should attributes be assigned to these vertices so that the
resulting network reflects the homophyly law to the greatest degree? Some attributes of a vertex
cannot be changed, such as nationality, sex, color and language, but some other attributes can 
be changed, such as interest, job, income and working place. For simplicity, we consider the 
case that each vertex contains only one alterable attribute, i.e., the network is a 1-dimensional 
network. Consider the following scenario. Suppose a company has many employees
who constitute a friendship network. Some employees have been assigned to work in some
departments of the company, while the remaining employees are waiting to be assigned. An
employee is happy if s/he works in the same department as all of (or a $p$ fraction of, for some
$p \in (0, 1]$, or at least $q$ of, for some integer $q > 0$) her/his friends; otherwise s/he is unhappy. Similarly,
a friendship is happy (or lucky) if the two related friends work in the same department;
otherwise the friendship is unhappy. Our goal is to achieve the greatest social benefits, that is, 
to maximize the number of happy vertices (similarly, happy edges) in the network. 

We can easily express the above problems as graph coloring problems, just identifying each 
attribute value with a different color. Hence we get two specific graph coloring problems, as 
defined below. 

Definition 1.1 (The MHV problem) (Instance) In the Maximum Happy Vertices (MHV)
problem, we are given an undirected graph $G = (V, E)$, a color set $C = \{1, 2, \ldots, k\}$, and a
partial vertex coloring function $c\colon V \to C$. We say that $c$ is a partial function in the sense that
$c$ assigns colors to only some of the vertices in $V$.

(Query) A vertex is happy if it shares the same color with all its neighbors, otherwise it is 
unhappy. The task is to extend c to a total function c' such that the number of happy vertices 
is maximized. 

Definition 1.2 (The MHE problem) (Instance) The input of the Maximum Happy Edges 
(MHE) problem is the same as that of the MHV problem. 

(Query) An edge is happy if its two endpoints have the same color, otherwise it is unhappy. 
The goal is to extend c to a total function c' such that the number of happy edges is maximized. 
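
The two objectives in Definitions 1.1 and 1.2 can be evaluated directly for any total coloring. The following is a minimal Python sketch, not part of the original paper; the helper names (`build_adjacency`, `count_happy_vertices`, `count_happy_edges`) and the small path graph are illustrative assumptions.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex]


def build_adjacency(vertices: Iterable[Vertex],
                    edges: Iterable[Edge]) -> Dict[Vertex, Set[Vertex]]:
    """Return an adjacency map for a simple undirected graph."""
    adj: Dict[Vertex, Set[Vertex]] = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def count_happy_vertices(adj, coloring) -> int:
    """A vertex is happy if every neighbor has the same color as it."""
    return sum(
        all(coloring[u] == coloring[v] for u in adj[v])
        for v in adj
    )


def count_happy_edges(edges, coloring) -> int:
    """An edge is happy if both endpoints have the same color."""
    return sum(coloring[u] == coloring[v] for u, v in edges)


# Hypothetical instance: the path a - b - c - d, colored 1, 1, 2, 2.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
coloring = {"a": 1, "b": 1, "c": 2, "d": 2}

adj = build_adjacency(vertices, edges)
happy_v = count_happy_vertices(adj, coloring)  # a and d are happy
happy_e = count_happy_edges(edges, coloring)   # edges ab and cd are happy
```

Under this sketch an isolated vertex counts as happy, since the condition on its neighbors holds vacuously.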

The vertex coloring defined by the total function $c'\colon V \to C$ in MHV and MHE is called
a total vertex coloring. In general, a (partial or total) vertex coloring can be denoted by
$(V_1, V_2, \ldots, V_k)$, where $V_i$ is the set of all vertices having color $i$. A total vertex coloring is
a partition of $V(G)$, while a partial vertex coloring may not be. Therefore, the MHV and MHE
problems are two extension problems from a partial vertex coloring to a total vertex coloring.
We remark that the coloring in our case is completely different from the well-known Graph
Coloring problem, which requires that the two endpoints of an edge be colored differently
and asks to color a graph in such a way using the minimum number of colors. We use the
notion of color just for intuition.

1 http://snap.stanford.edu/data/cit-HepTh.html

If in the MHV problem the number of colors $k$ is a constant, the problem is denoted by $k$-MHV.
For the specific values of $k$, we have the 2-MHV problem, the 3-MHV problem, and so on. Note
that in the original MHV problem $k$ is given as a part of the input. Similarly, we have the
$k$-MHE problem for constant $k$, with 2-MHE, 3-MHE, etc. being its specific problems.

We remark that both the MHV and MHE problems are natural and fundamental algorithmic
problems, and that they have not yet appeared in the literature. The reasons could be
twofold. On the one hand, we ask questions that arise from our network applications, which did not
occur before; on the other hand, the meaning of coloring has previously been fixed so that
the two endpoints of an edge must have different colors. We note that the current version of
our problems may not help network applications much because of their simplicity. For
real network applications, the experimental method [13] introduced at the beginning
of this section is probably good enough. However, it has no theoretical guarantee, owing to the different
structures of networks. Our problems seem to be essentially new and fundamental algorithmic
problems. Theoretical analysis of the problems is always helpful for understanding their nature,
and hence is very welcome.

1.1 Our Results 

We investigate algorithms to solve the MHV and MHE problems. It is easy to see that the 
partial function c plays an important role in the MHV and MHE problems. If none of the 
vertices in the input graph has a pre-specified color, then the MHV and MHE problems are 
trivial. The optimal solution just assigns one arbitrary color to all the vertices. This will make 
all vertices and all edges happy. 

We prove that the MHV and MHE problems are NP-hard. Interestingly, the complexity of
$k$-MHV and $k$-MHE changes dramatically when $k$ changes from 2 to 3. Specifically, we prove
that both 2-MHV and 2-MHE can be solved in polynomial time, while both $k$-MHV and $k$-MHE
are NP-hard for any constant $k \ge 3$. We thus seek approximation algorithms for the
MHV and MHE problems, and their variants $k$-MHV and $k$-MHE ($k \ge 3$).

We design two approximation algorithms, Greedy-MHV (Subsection 2.2.1) and Growth-MHV
(Subsection 2.2.2), for the MHV problem and its variant $k$-MHV. Algorithm Greedy-MHV
is a simple greedy algorithm with approximation ratio $1/k$. Algorithm Growth-MHV is
an algorithm based on the subset-growth technique with approximation ratio $\Omega(\Delta^{-3})$, where $\Delta$
is the maximum degree of vertices in the input graph. In real networks, $\Delta$ is usually $\mathrm{poly}\log n$,
implying that the ratio $\Omega(\Delta^{-3})$ is reasonable. As Algorithm Growth-MHV executes, more
and more vertices are colored. According to the current vertex coloring of the input graph,
we define several types for the vertices. (Note that the types here are not colors.) Algorithm
Growth-MHV works by carefully classifying all the vertices into several types.

We can extend our algorithms for MHV to deal with two more natural variants, SoftMHV
and HardMHV. In the SoftMHV problem, a vertex $v$ is happy if $v$ shares the same color with at
least $p \cdot \deg(v)$ neighbors, where $p$ (that is, the soft threshold) is a number in $(0, 1]$ and $\deg(v)$ is
the degree of vertex $v$. In the HardMHV problem, a vertex $v$ is happy if $v$ shares the same color
with at least $q$ neighbors, where $q$ (that is, the hard threshold) is an integer. We show that the
SoftMHV and HardMHV problems can also be approximated within $\max\{1/k, \Omega(\Delta^{-3})\}$. The
approximation algorithms for SoftMHV and HardMHV, given in the Appendix for completeness,
are similar to that for MHV.

For the MHE problem and its variant $k$-MHE, we devise a simple approximation algorithm
based on a division strategy, namely, Algorithm Division-MHE (Section 3). The approximation
ratio is proved to be 1/2.

1.2 Related Work and Relation to Other Problems 

The MHV and MHE problems are two quite natural vertex classification problems arising from
the homophyly phenomenon in networks. Classification is a fundamental problem and has wide
applications in statistics, pattern recognition, machine learning, and many other fields. Given a
set of objects to be classified and a set of colors, a classification problem can be described, from
a very high level, as assigning a color to each object in a way that is consistent with some observed
data or structure that we have about the problem [DUI]. In our problems, the observed structure
is homophyly. Since the MHV and MHE problems are essentially new, in the following we just
show some closely related problems and results.

Thomas Schelling [17, 18], the Nobel economics prize winner, showed by experiments how
global patterns of spatial segregation arise from the effect of homophyly operating at the local
level. The experiments in [17] are given in one-dimensional and two-dimensional geometric
models. From a more general viewpoint of graph theory, Schelling's experiments, although given
in geometric models, can be viewed as asking how to remove and add edges from/to a graph whose
vertices are all colored such that the resulting graph possesses the homophyly
property. In contrast, the MHV and MHE problems ask how to color the vertices in a given
graph in which some vertices are already colored such that the resulting graph possesses the
homophyly property.

The Multiway Cut problem El El [10] should be the traditional optimization problem that 
is most related to MHV and MHE. Given an undirected graph $G = (V, E)$ with costs defined
on edges and a terminal set $S \subseteq V$, the Multiway Cut problem asks for a set of edges (called a
multiway cut, or simply a cut) with the minimum total cost such that its removal from graph
$G$ separates all terminals in $S$ from one another. The Multiway Cut problem in general graphs
is NP-hard even when the terminal set contains only three terminals and each edge has unit cost
[3]. The current best approximation ratio known for this problem is 1.3438 [10].

Removing a minimum multiway cut from a graph breaks the graph into several components 
such that each component contains exactly one terminal. From the viewpoint of graph coloring, 
this is equivalent to coloring the uncolored vertices in a graph in which each terminal has a 
distinct pre-specified color, such that the number of happy edges is maximized. Therefore, the 
MHE problem is actually the dual of the Multiway Cut problem. See Figure 1 for an example.
(More precisely, the dual of Multiway Cut is only a special case of MHE, since in MHE there
may be more than one vertex having the same pre-specified color.) However, Multiway Cut
and MHE are quite different in terms of approximation, since one is a maximization problem 
while the other is a minimization problem. 

For a vertex subset $V' \subseteq V(G)$ of graph $G$, we define the border of $V'$ to be the set of
vertices in $V'$ that have a neighbor not in $V'$. Given a vertex coloring $(V_1, V_2, \ldots, V_k)$ of graph
$G$, the vertices in the border of each $V_i$ are obviously unhappy. The MHV problem, which finds
a vertex coloring that maximizes the number of happy vertices, is actually equivalent to finding
a vertex coloring $(V_1, V_2, \ldots, V_k)$ for a graph in which some vertices are already colored, such
that the total number of vertices in the borders of all the $V_i$'s is minimized. Please refer to Figure 1 for
an example. The latter problem is a new minimization problem; the MHV
problem and this new problem are dual to each other.

Figure 1: An instance of Multiway Cut and the induced vertex coloring. The square vertices
are terminals and have pre-specified colors, while the round vertices are non-terminal vertices.
The hollow vertices are border vertices.
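
The relation between happy vertices and borders can be checked on a small example. The sketch below is illustrative only: the function names and the five-vertex graph are assumptions, not taken from the paper. It computes the total border size of a total coloring and compares it with the number of happy vertices.

```python
def border(adj, subset):
    """Vertices of `subset` that have at least one neighbor outside `subset`."""
    s = set(subset)
    return {v for v in s if any(u not in s for u in adj[v])}


def total_border_size(adj, coloring):
    """Sum of the border sizes of all color classes of a total coloring."""
    classes = {}
    for v, color in coloring.items():
        classes.setdefault(color, set()).add(v)
    return sum(len(border(adj, cls)) for cls in classes.values())


# Hypothetical instance: a 4-cycle 1-2-3-4 with a pendant vertex 5 attached to 4.
adj = {
    1: {2, 4},
    2: {1, 3},
    3: {2, 4},
    4: {1, 3, 5},
    5: {4},
}
coloring = {1: "red", 2: "red", 3: "blue", 4: "blue", 5: "blue"}

unhappy = total_border_size(adj, coloring)
happy = sum(all(coloring[u] == coloring[v] for u in adj[v]) for v in adj)
```

Here every vertex is either happy or in the border of its own color class, so `happy + unhappy` equals the number of vertices; maximizing one quantity is the same as minimizing the other.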

From the above analysis, one can see that the partial function c in the MHE problem (and 
the MHV problem), which assigns colors to part of vertices of the input graph, actually simulates 
and generalizes the terminal set part in the Multiway Cut problem. 

Kann and Khanna et al. [9] studied the Max $k$-Cut problem [7] and its dual, that is, the Min
$k$-Partition problem [9]. Given an undirected graph $G = (V, E)$, the Min $k$-Partition problem
asks to find a vertex coloring $c\colon V \to \{1, 2, \ldots, k\}$ such that the number of edges whose two
endpoints have the same color (i.e., the happy edges in our setting) is minimized.

Following the style of the definitions in [9], we can define the dual of the Min $k$-Cut problem
[16] as follows: Given an undirected graph $G = (V, E)$ and an integer $k > 0$, find an edge
subset whose removal breaks graph $G$ into exactly $k$ components, such that the number of
remaining edges is maximized. We call this problem the Max $k$-Partition problem. In other
words, Max $k$-Partition asks for a total vertex coloring $c'\colon V \to \{1, 2, \ldots, k\}$ such that the
number of happy edges is maximized, where $c'$ should be a surjective function (that is, for each
color $i$ there exists a vertex whose color is $i$).

The Max $k$-Partition problem defined above is close to the MHE problem, but they
still differ in an obvious way: in Max $k$-Partition no vertex has a pre-specified
color and the required vertex coloring function $c'$ must be surjective, while in MHE there must
be some vertices having pre-specified colors and the required vertex coloring function $c'$ need
not be surjective.

Notations. Let $G = (V, E)$ be a graph. Let $n = |V|$ and $m = |E|$. Suppose $v \in V$ is
a vertex. Denote by $N(v)$ the set of neighbors of $v$. As usual, $\deg(v)$ means the degree of $v$,
i.e., $\deg(v) = |N(v)|$. Denote by $N^2(v)$ the set of neighbors of neighbors of $v$ (not including $v$
itself), i.e., the vertices within distance 2 of $v$ (assume each edge has unit distance).

Given a vertex coloring $c$, for a (colored or uncolored) vertex $v$, define $N_u(v)$ as the set of
vertices in $N(v)$ that have not yet been colored. For a colored vertex $v$, define $N_s(v)$ as the set
of vertices in $N(v)$ having the same color as $v$, and $N_d(v)$ as the set of vertices in $N(v)$ having
colors different from $c(v)$.

Given an instance $\mathcal{I}$ of some optimization problem $\mathcal{P}$, we use $OPT(\mathcal{I})$ (OPT for short) to
denote the optimum (that is, the value of an optimal solution) of the instance. Let $\mathcal{A}$ be an
algorithm for problem $\mathcal{P}$. We use $SOL(\mathcal{I})$ (SOL for short) to denote the value of the solution
found by algorithm $\mathcal{A}$ on instance $\mathcal{I}$ of problem $\mathcal{P}$. In addition, OPT and SOL also denote the
corresponding solutions, abusing notation slightly.

Organization of the paper. The remainder of the paper is organized as follows. In
Section 2 we show that 2-MHV is polynomial-time solvable, and give the greedy approximation
algorithm and the subset-growth approximation algorithm for the MHV and $k$-MHV ($k \ge 3$)
problems. In Section 3 we show that 2-MHE is polynomial-time solvable, and give the
division-strategy based approximation algorithm for the MHE and $k$-MHE ($k \ge 3$) problems. In Section 4
we prove the NP-hardness of the MHE, $k$-MHE ($k \ge 3$), MHV, and $k$-MHV ($k \ge 3$) problems.
In Section 5 we conclude the paper by introducing some future work. In the Appendix, we give
approximation algorithms for the SoftMHV and HardMHV problems.

2 Algorithms for MHV 

In Subsection 2.1 we give the polynomial-time exact algorithm for the 2-MHV problem. In
Subsection 2.2 we give the approximation algorithms Greedy-MHV and Growth-MHV for
the MHV problem.

2.1 2-MHV Is in P 

Let $U$ be a finite set. Recall that a function $f\colon 2^U \to \mathbb{Z}^+$ is said to be submodular if
$f(X) + f(Y) \ge f(X \cup Y) + f(X \cap Y)$ holds for all $X, Y \subseteq U$. Given a vertex subset $V' \subseteq V(G)$, define
the function $f(V')$ to be the number of vertices in $V'$ that have neighbors outside of $V'$, i.e., $f(V')$
is the size of the border (see Subsection 1.2) of $V'$. It is easy to verify that $f$ is a submodular
function.
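
As a sanity check of the submodularity claim, the inequality can be tested exhaustively on a small graph. The following Python sketch is an illustration under stated assumptions (the names `border_size` and `is_submodular` and the five-vertex graph are hypothetical); it is a brute-force check on one instance, not a proof.

```python
from itertools import combinations


def border_size(adj, subset):
    """f(V'): number of vertices in `subset` with a neighbor outside it."""
    s = set(subset)
    return sum(1 for v in s if any(u not in s for u in adj[v]))


def all_subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_submodular(adj):
    """Check f(X) + f(Y) >= f(X | Y) + f(X & Y) for all pairs of subsets."""
    subsets = list(all_subsets(adj))
    return all(
        border_size(adj, x) + border_size(adj, y)
        >= border_size(adj, x | y) + border_size(adj, x & y)
        for x in subsets
        for y in subsets
    )


# Hypothetical 5-vertex graph (a 5-cycle 0-1-3-4-2 with the chord 1-2).
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 4}, 3: {1, 4}, 4: {2, 3}}
result = is_submodular(adj)
```

The check enumerates all $2^n \times 2^n$ pairs of subsets, so it is only practical for very small graphs.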

Consider the 2-MHV problem, in which the color set C contains only two colors 1 and 2. 
This problem can be solved in polynomial time. 

Theorem 2.1 The 2-MHV problem can be solved in $O(mn^7 \log n)$ time.

Proof: Let $V_1^{org}$ be the set of vertices that are colored by color 1 by the partial function $c$,
and $V_2^{org}$ be the analogous vertex subset corresponding to color 2. Then the 2-MHV problem
is equivalent to finding a cut $(V_1, V_2)$ such that $V_i^{org} \subseteq V_i$ for $i = 1, 2$ and $f(V_1) + f(V_2)$ is
minimized. We can do this by merging all vertices in $V_1^{org}$ into a single vertex $s$, all vertices
in $V_2^{org}$ into a single vertex $t$, and finding an $s$-$t$ cut $(V_1, V_2)$ on the resulting graph such that
$f(V_1) + f(V_2)$ is minimized. As pointed out by [19, Lemma 3], finding such a cut can be done
by an algorithm in [8] for minimizing submodular functions in $O(\theta n^7 \log n)$ time, where $\theta$ is the
time to compute the submodular function $f$. When the input graph is stored as a collection
of adjacency lists, $f(\cdot)$ can be computed in $O(m)$ time in a straightforward way (assuming the
input graph contains no isolated vertex). The proof of the theorem is finished. □

2.2 Approximation Algorithms for MHV 

The approximation algorithms for MHV work based on the types defined for vertices, as shown
in Definition 2.1.

Definition 2.1 (Types of vertices in MHV) Fix a (partial or total) vertex coloring. Let $v$
be a vertex. Then,

1. $v$ is an H-vertex if $v$ is colored and happy (i.e., $|N_s(v)| = \deg(v)$);

2. $v$ is a U-vertex if $v$ is colored and destined to be unhappy (i.e., $|N_d(v)| > 0$);

3. $v$ is a P-vertex if

(a) $v$ is colored,

(b) $v$ is not yet happy (i.e., $|N_s(v)| < \deg(v)$), and

(c) $v$ may become happy in the future (i.e., $|N_d(v)| = 0$);

4. $v$ is an L-vertex if $v$ has not been colored.

See Figures 2, 3 and 4 for examples of the vertex types. Note that by a type name we also mean
the set of vertices of that type. Conversely, by a set name we also mean that each element in
the set is of that type. For example, $H$ is the set of all H-vertices; each vertex in the set $H$ is
an H-vertex.
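
The four types can be computed directly from a partial coloring. The sketch below is a minimal illustration, assuming the partial coloring is stored as a dictionary that simply omits uncolored vertices; the function name `vertex_type` and the example path are hypothetical.

```python
def vertex_type(adj, coloring, v):
    """Return 'H', 'U', 'P' or 'L' for vertex v under a partial coloring.

    `coloring` maps each colored vertex to its color; uncolored vertices
    are absent from the dictionary (an assumption of this sketch).
    """
    if v not in coloring:
        return "L"                      # uncolored
    same = sum(1 for u in adj[v]
               if u in coloring and coloring[u] == coloring[v])
    different = sum(1 for u in adj[v]
                    if u in coloring and coloring[u] != coloring[v])
    if different > 0:
        return "U"                      # destined to be unhappy
    if same == len(adj[v]):
        return "H"                      # already happy
    return "P"                          # colored, may still become happy


# Hypothetical instance: the path a - b - c - d - e with c uncolored.
adj = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
partial = {"a": 1, "b": 1, "d": 2, "e": 1}
types = {v: vertex_type(adj, partial, v) for v in adj}
```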

2.2.1 Greedy Approximation Algorithm for MHV 

Algorithm Greedy-MHV. The approximation algorithm Greedy-MHV for MHV is quite
simple. For each color $i \in C$, we color all uncolored vertices with color $i$. Since there are $k$ colors in $C$,
this yields $k$ vertex colorings of graph $G$. Finally, we output the coloring that has the largest
number of happy vertices.
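
The description above translates almost line by line into code. The following Python sketch is one possible rendering; the function name `greedy_mhv`, the dictionary-based graph representation, and the example path are assumptions made for illustration.

```python
def greedy_mhv(adj, partial, colors):
    """Try each color for all uncolored vertices; keep the best coloring.

    Returns (coloring, number_of_happy_vertices). Ties are broken in favor
    of the color that appears first in `colors`.
    """
    best_coloring, best_happy = None, -1
    for color in colors:
        # Extend the partial coloring by giving `color` to every uncolored vertex.
        total = {v: partial.get(v, color) for v in adj}
        happy = sum(
            all(total[u] == total[v] for u in adj[v])
            for v in adj
        )
        if happy > best_happy:
            best_coloring, best_happy = total, happy
    return best_coloring, best_happy


# Hypothetical instance: the path 1 - 2 - 3 - 4 - 5 with both ends pre-colored.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
partial = {1: "a", 5: "b"}
coloring, happy = greedy_mhv(adj, partial, ["a", "b", "c"])
```

On this instance, coloring everything with "a" or with "b" makes three vertices happy, while "c" makes only the middle vertex happy.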

Theorem 2.2 Algorithm Greedy-MHV is a $1/k$-approximation algorithm for the MHV problem,
where $k$ is the number of colors given in the input.

Proof: Let the partial function $c$ be the vertex coloring used in Definition 2.1. We partition
the L-vertices further into two subsets $L_P$ and $L_U$. $L_P$ is the set of uncolored vertices that can
become happy (i.e., whose neighbors have at most one color). $L_U$ is the set of uncolored vertices
that are destined to be unhappy (i.e., whose neighbors already have at least two distinct colors).
Then $(H, P, U, L_P, L_U)$ is a partition of $V(G)$. Obviously, in the best case OPT can make all
vertices in $H$, $P$ and $L_P$ happy, implying $|H| + |P| + |L_P| \ge OPT$.

Let $SOL_i$ be the number of happy vertices when Algorithm Greedy-MHV colors all uncolored
vertices by color $i$. Each vertex in $H$, $P$ or $L_P$ is happy in at least one of these $k$ colorings,
so we have $|H| + |P| + |L_P| \le \sum_i SOL_i$. By the greedy strategy, SOL,
which is the number of happy vertices found by Greedy-MHV, equals $\max_i SOL_i$ and is therefore
at least $\frac{1}{k}(|H| + |P| + |L_P|) \ge \frac{1}{k} OPT$.
The theorem follows by observing that Greedy-MHV obviously runs in polynomial time. □

2.2.2 Subset-Growth Approximation Algorithm for MHV 

The subset-growth algorithm starts with the partial vertex coloring $(V_1, V_2, \ldots, V_k)$ defined by
the partial function $c$. From a high-level point of view, the algorithm iteratively augments the
subsets in $(V_1, V_2, \ldots, V_k)$ by satisfying the vertices that can become happy easily at the current
time, until $(V_1, V_2, \ldots, V_k)$ becomes a partition of $V(G)$ and thus a vertex coloring is obtained.
This strategy is based on the following further classification of L-vertices, according to the types
of their neighbors. Recall that by Definition 2.1 an L-vertex is an uncolored vertex.

Definition 2.2 (Subtypes of L-vertex in MHV) Let $v$ be an L-vertex in a vertex coloring.
Then,

1. $v$ is an $L_p$-vertex if $v$ is adjacent to a P-vertex;

2. $v$ is an $L_h$-vertex if

(a) $v$ is not adjacent to any P-vertex,

(b) $v$ can become happy, that is, $v$ is adjacent to U-vertices with only one color;

3. $v$ is an $L_u$-vertex if

(a) $v$ is not adjacent to any P-vertex,

(b) $v$ is destined to be unhappy, that is, $v$ is adjacent to U-vertices with more than one
color;

4. $v$ is an $L_f$-vertex if $v$ is not adjacent to any colored vertex.

See Figures 3 and 4 for examples of the subtypes of L-vertex.
The subset-growth algorithm Growth-MHV is as follows. 

Algorithm Growth-MHV

Input: A connected undirected graph $G$ and a partial coloring function $c$.
Output: A total vertex coloring for $G$.

1  For all $1 \le i \le k$, $V_i \leftarrow \{v : c(v) = i\}$.
2  while there exist L-vertices do
3      if there exists a P-vertex $v$ then
4          $i \leftarrow c(v)$.
5          Add all the $L_p$-neighbors of $v$ to $V_i$. The types of all affected vertices (including $v$
           and vertices in $N^2(v)$) are changed accordingly.
6      elseif there exists an $L_h$-vertex $v$ then
7          Let $u$ be any U-vertex adjacent to $v$, then $i \leftarrow c(u)$.
8          Add $v$ and all its L-neighbors to $V_i$. The types of all affected vertices (including $v$
           and vertices in $N^2(v)$) are changed accordingly.
9      else
           Comment: There must be an $L_u$-vertex.
10         Let $v$ be any $L_u$-vertex and $u$ be any U-vertex adjacent to $v$, then $i \leftarrow c(u)$.
11         Add $v$ to $V_i$. The types of all affected vertices (including $v$ and vertices in $N(v)$) are
           changed accordingly.
12     endif
13 endwhile
14 return the vertex coloring $(V_1, V_2, \ldots, V_k)$.

When there are still L-vertices (i.e., uncolored vertices), Algorithm Growth-MHV works
in the following way. It first colors a P-vertex's neighbors to make this P-vertex happy (see
Figure 2). When there is no P-vertex, it colors an $L_h$-vertex and its neighbors to make
the $L_h$-vertex happy (see Figure 3). When there is neither a P-vertex nor an $L_h$-vertex, it colors an
$L_u$-vertex with the color of any of its U-vertex neighbors (see Figure 4). Note that coloring a vertex
may generate new P-vertices, $L_h$-vertices, or $L_u$-vertices.

When there exist L-vertices, it is impossible that there are only $L_f$-vertices but no
$L_p$-vertex, $L_h$-vertex or $L_u$-vertex, since by assumption $G$ is a connected graph and by definition
an $L_f$-vertex is not adjacent to any colored vertex. So, when there is no $L_p$-vertex or $L_h$-vertex,
there must be at least one $L_u$-vertex. As a result, in step 9 we do not need an if statement like
those in steps 3 and 6.
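
The three branches of Algorithm Growth-MHV can be sketched in Python as follows. This is a simplified illustration rather than the authors' implementation: it recomputes vertex types from scratch in every iteration instead of updating them incrementally, always picks the first eligible vertex, and assumes a connected graph with at least one pre-colored vertex. The name `growth_mhv` and the example instance are hypothetical.

```python
def growth_mhv(adj, partial):
    """Sketch of Growth-MHV; `partial` maps pre-colored vertices to colors."""
    col = dict(partial)

    def is_p(v):
        # P-vertex: colored, has an uncolored neighbor, no differently colored one.
        return (v in col
                and any(u not in col for u in adj[v])
                and all(col[u] == col[v] for u in adj[v] if u in col))

    def subtype(v):
        # Subtype of an uncolored vertex: 'p', 'h', 'u' or 'f' (Definition 2.2).
        if any(is_p(u) for u in adj[v]):
            return "p"
        seen = {col[u] for u in adj[v] if u in col}
        if not seen:
            return "f"
        return "h" if len(seen) == 1 else "u"

    while len(col) < len(adj):
        p_vertices = [v for v in adj if is_p(v)]
        if p_vertices:                      # steps 3-5: make a P-vertex happy
            v = p_vertices[0]
            for u in adj[v]:
                col.setdefault(u, col[v])
            continue
        uncolored = [v for v in adj if v not in col]
        h_vertices = [v for v in uncolored if subtype(v) == "h"]
        if h_vertices:                      # steps 6-8: make an L_h-vertex happy
            v = h_vertices[0]
            color = next(col[u] for u in adj[v] if u in col)
            col[v] = color
            for u in adj[v]:
                col.setdefault(u, color)
            continue
        # steps 9-11: only L_u- (and L_f-) vertices remain; color one L_u-vertex
        v = next(v for v in uncolored if subtype(v) == "u")
        col[v] = next(col[u] for u in adj[v] if u in col)
    return col


# Hypothetical instance: x and y are adjacent and pre-colored differently.
adj = {
    "x": ["y", "w", "t"],
    "y": ["x", "t"],
    "w": ["x", "z"],
    "z": ["w"],
    "t": ["x", "y"],
}
result = growth_mhv(adj, {"x": "a", "y": "b"})
happy = sum(all(result[u] == result[v] for u in adj[v]) for v in adj)
```

In this run the $L_h$-vertex w is processed first (coloring w and z), and the $L_u$-vertex t is colored last; w and z end up happy.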

We use a type name with the superscript "org" (meaning "original") to denote the set of
vertices of that type which is determined by the partial function $c$, and a type name with the
superscript "new" to denote the set of vertices of that type which is determined in the execution
of Algorithm Growth-MHV. For example, $H^{org}$ is the set of H-vertices that are determined by
the partial function $c$, and $H^{new}$ is the set of H-vertices that are newly generated by Algorithm
Growth-MHV.

Figure 2: Process a P-vertex. The hollow vertex $v$ in graph (a) is the P-vertex to be processed.
The square vertices mean colored vertices, while the round vertices mean uncolored vertices.

Figure 3: Process an $L_h$-vertex. The hollow vertex $v$ in graph (a) is the $L_h$-vertex to be
processed. Note that when an $L_h$-vertex is to be processed, there is no P-vertex in the current
graph (a).

Figure 4: Process an $L_u$-vertex. The hollow vertex $v$ in graph (a) is the $L_u$-vertex to be
processed. Note that when an $L_u$-vertex is to be processed, there is no P-vertex or
$L_h$-vertex in the current graph (a).

Let $\Delta$ be the maximum degree of vertices in the input graph. We first bound the number
of $L_u^{new}$-vertices.

Lemma 2.1 $|L_u^{new}| \le \Delta(\Delta - 2)|H^{new}|$.

Proof: Algorithm Growth-MHV iteratively processes three types of vertices, that is, the
P-vertices, the $L_h$-vertices and the $L_u$-vertices. We will prove the lemma by proving the following
three points: (1) When Algorithm Growth-MHV processes a P-vertex, at most $\Delta(\Delta - 2)$
$L_u^{new}$-vertices are generated, (2) When Algorithm Growth-MHV processes an $L_h$-vertex, at
most $(\Delta - 1)(\Delta - 2)$ $L_u^{new}$-vertices are generated, and (3) When Algorithm Growth-MHV
processes an $L_u$-vertex, no $L_u^{new}$-vertex is generated.

Consider the first point. Let $v$ be a P-vertex to be processed. Suppose $v$ has an $L_p$-neighbor
$w$, which is adjacent to a U-vertex. Only if there is an $L_h$-vertex $x$ which is a neighbor of
$w$ will $x$ become a newly generated $L_u$-vertex when the P-vertex $v$ is processed. See Figure
2 for an example. Since the maximum vertex degree is $\Delta$, $v$ has at most $\Delta$ $L_p$-neighbors, and
$w$ has at most $\Delta - 2$ $L_h$-neighbors. This implies that when $v$ is processed, at most $\Delta(\Delta - 2)$
$L_u^{new}$-vertices can be generated.

Then consider the second point. Suppose the $L_h$-vertex to be processed is $v$. Suppose $v$
has an L-neighbor $w$ ($w$ can be an $L_h$-vertex or an $L_u$-vertex), which is adjacent to a U-vertex.
Similarly, only if there is an $L_h$-vertex $x$ which is a neighbor of $w$ will $x$ become a newly
generated $L_u$-vertex when the $L_h$-vertex $v$ is processed. See Figure 3 for an example. Since
the maximum vertex degree is $\Delta$, $v$ has at most $\Delta - 1$ L-neighbors, and $w$ has at most $\Delta - 2$
$L_h$-neighbors. This implies that when $v$ is processed, at most $(\Delta - 1)(\Delta - 2)$ $L_u^{new}$-vertices can
be generated.

Finally consider the third point. When Algorithm Growth-MHV processes an $L_u$-vertex,
there is no $L_h$-vertex (or P-vertex) in the current graph. So, adding an $L_u$-vertex to some
subset $V_i$ does not generate any new $L_u$-vertex. See Figure 4 for an example.

When Algorithm Growth-MHV processes a P-vertex or an $L_h$-vertex, at least one vertex
becomes an H-vertex. So we can charge the number of newly generated $L_u$-vertices to this
newly generated H-vertex. This finishes the proof of the lemma. □

The following Lemma 2.2 gives an upper bound on OPT, the number of happy vertices in
an optimal solution to the $k$-MHV problem.

Lemma 2.2 $OPT \le |H^{org}| + (\Delta + 1)(|L^{org}| - |L_u^{org}|)$.

Proof: By the partial function $c$, all vertices in the original graph (i.e., the input graph that
has not been colored by Algorithm Growth-MHV) are partitioned into four vertex subsets
$H^{org}$, $P^{org}$, $U^{org}$ and $L^{org}$. Subset $L^{org}$ is further partitioned into four subsets $L_p^{org}$, $L_h^{org}$, $L_u^{org}$
and $L_f^{org}$. By definition, all vertices in $U^{org}$ are unhappy. Moreover, all vertices in $L_u^{org}$ are destined
to be unhappy since each of them is adjacent to at least two vertices with different colors. So,
in the best case all vertices in $P^{org}$ and $L^{org}$ except those in $L_u^{org}$ would be happy. Noticing
that the vertices in $H^{org}$ are already happy, we have

$$OPT \le |H^{org}| + |P^{org}| + |L^{org}| - |L_u^{org}|.$$

Since each P-vertex must be adjacent to some $L_p$-vertex, and each $L_p$-vertex can be adjacent
to at most $\Delta$ P-vertices, the number of $P^{org}$-vertices is at most $\Delta|L_p^{org}|$. Since
$|L_p^{org}| \le |L^{org}| - |L_u^{org}|$, we get that

$$
\begin{aligned}
OPT &\le |H^{org}| + \Delta|L_p^{org}| + |L^{org}| - |L_u^{org}| \\
    &\le |H^{org}| + (\Delta + 1)(|L^{org}| - |L_u^{org}|),
\end{aligned}
$$

concluding the lemma. □

Lemma 2.3 $|H^{new}| \ge \frac{1}{\Delta(\Delta - 1)}(|L^{org}| - |L_u^{org}|)$.

Proof: Recall that there are four subtypes of an L-vertex, i.e., $L_p$-vertex, $L_h$-vertex,
$L_u$-vertex and $L_f$-vertex. Among them only $L_p$-vertices and $L_h$-vertices will (directly) contribute to
generating H-vertices. An $L_f$-vertex will ultimately become one of the other three types
of L-vertex. An $L_u$-vertex may become an $L_p$-vertex and hence can contribute
to generating H-vertices, but in the worst case we may assume that it is added to some subset $V_i$
and contributes nothing to the generation of H-vertices.

By steps 3 and 6, each time an H-vertex is generated, at most $\Delta$ $L_p$-vertices or
$L_h$-vertices are consumed (i.e., colored). Furthermore, once an L-vertex is colored, it will never be
re-colored or de-colored. So we have

$$|H^{new}| \ge \frac{1}{\Delta}\left(|L^{org}| - |L_u^{org}| - |L_u^{new}|\right).$$

By Lemma 2.1 we have

$$
\begin{aligned}
\frac{1}{\Delta}\left(|L^{org}| - |L_u^{org}| - |L_u^{new}|\right)
  &\ge \frac{1}{\Delta}\left(|L^{org}| - |L_u^{org}| - \Delta(\Delta - 2)|H^{new}|\right) \\
  &= \frac{1}{\Delta}\left(|L^{org}| - |L_u^{org}|\right) - (\Delta - 2)|H^{new}|.
\end{aligned}
$$

Therefore, $(\Delta - 1)|H^{new}| \ge \frac{1}{\Delta}(|L^{org}| - |L_u^{org}|)$. The lemma follows. □

Theorem 2.3 The MHV problem can be approximated within a factor of $\Omega(\Delta^{-3})$ in
polynomial time.

Proof: Algorithm Growth-MHV obviously runs in polynomial time. Let $SOL$ be the number
of happy vertices found by Algorithm Growth-MHV. Then we have

$$\begin{aligned}
SOL &= |H^{org}| + |H^{new}| \\
    &\ge |H^{org}| + \frac{1}{\Delta(\Delta - 1)}(|L^{org}| - |L_u^{org}|) && \text{(by Lemma 2.3)} \\
    &\ge \frac{1}{\Delta(\Delta - 1)(\Delta + 1)}\Bigl(|H^{org}| + (\Delta + 1)(|L^{org}| - |L_u^{org}|)\Bigr) \\
    &\ge \frac{1}{\Delta(\Delta - 1)(\Delta + 1)}\,OPT && \text{(by Lemma 2.2)} \\
    &= \Omega(\Delta^{-3})\,OPT.
\end{aligned}$$

The theorem follows.

3 Algorithms for MHE



3.1 2-MHE Is in P 

For 2-MHE, the partial function $c$ can use only two colors, say color 1 and color 2. Given such
an instance, merge all vertices assigned color 1 by $c$ into a single vertex $s$, and all vertices
assigned color 2 into a single vertex $t$. (Edges whose two endpoints are merged into the same
vertex disappear in the procedure.) Then compute a minimum $s$-$t$ cut $(V_1, V_2)$ on the resulting
instance. Suppose $s \in V_1$ and $t \in V_2$. Assign color 1 to all vertices (including the merged
vertices) in $V_1$, and color 2 to all vertices in $V_2$. The unhappy edges of this coloring are exactly
the cut edges, so since $(V_1, V_2)$ is a minimum $s$-$t$ cut, the number of happy edges in
the resulting vertex coloring is maximized. By the work of [6], a maximum flow (and hence a
minimum $s$-$t$ cut) in a unit capacity network can be computed in $O(\min\{n^{2/3}m, m^{3/2}\})$ time.
So we have

Theorem 3.1 The 2-MHE problem can be solved in $O(\min\{n^{2/3}m, m^{3/2}\})$ time. □
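
The merge-and-cut procedure of this subsection can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `solve_2mhe`, the sentinels `S` and `T`, and the encoding of the instance (a vertex list, a list of unweighted edges, and a dict for the partial coloring) are hypothetical, and the maximum flow is found with plain breadth-first augmenting paths rather than the algorithm of [6], so the sketch does not attain the running time of Theorem 3.1.

```python
from collections import defaultdict, deque

# Hypothetical sentinels standing for the merged vertices s and t.
S, T = ("merged", 1), ("merged", 2)

def solve_2mhe(vertices, edges, partial):
    """Extend a partial 2-coloring so that the number of happy edges is maximum."""
    def rep(v):
        # Representative of v after merging the pre-colored vertices.
        c = partial.get(v)
        return S if c == 1 else T if c == 2 else v

    # Residual capacities; each undirected edge gets capacity 1 in both directions.
    cap = defaultdict(lambda: defaultdict(int))
    for u, v in edges:
        a, b = rep(u), rep(v)
        if a != b:  # edges inside a merged vertex disappear
            cap[a][b] += 1
            cap[b][a] += 1

    def bfs_tree():
        # Vertices reachable from S in the residual graph, with BFS parents.
        parent = {S: None}
        queue = deque([S])
        while queue:
            x = queue.popleft()
            for y, c in cap[x].items():
                if c > 0 and y not in parent:
                    parent[y] = x
                    queue.append(y)
        return parent

    while True:
        reach = bfs_tree()
        if T not in reach:
            break  # no augmenting path left: reach is the s-side of a minimum cut
        y = T
        while reach[y] is not None:  # push one unit of flow along the path
            x = reach[y]
            cap[x][y] -= 1
            cap[y][x] += 1
            y = x

    return {v: 1 if rep(v) in reach else 2 for v in vertices}
```

For instance, on the graph with edges `a-b`, `b-c`, `c-d`, `a-c` where `a` is pre-colored 1 and `d` is pre-colored 2, the sketch cuts only the edge `c-d` and colors `b` and `c` with color 1.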

3.2 Approximation Algorithm for MHE 

The MHE problem admits a simple division-strategy based algorithm which yields a 1/2- 
approximation. The algorithm is designed to deal with more general graphs with nonnegative 
weights {w(e)} defined on edges. We thus denote by w(E') the total weight of edges in an edge 
subset E' . 

Algorithm Division-MHE 

Input: An undirected graph G and a partial coloring function c. 
Output: A total vertex coloring for G. 

1 $G_1 \leftarrow G$.

2 Let $E'$ be the set of edges in $G_1$ that have exactly one endpoint not colored by function $c$.
  Define graph $G' = (V(G_1), E')$, which is a subgraph of $G_1$.

3 For each star $S$ in $G'$ centered at an uncolored vertex $v$, color $v$ by a color in
  $\{c(u) \mid u \in N(v), u \text{ is colored}\}$ such that the total weight of happy edges in $S$ is maximized.

4 Color all vertices in $G_1$ that are still uncolored by just one arbitrary color. Denote
  by $SOL_1$ the vertex coloring of $G_1$.

5 $G_2 \leftarrow G$.

6 Color all uncolored vertices in $G_2$ by just one arbitrary color. Denote by $SOL_2$ the vertex
  coloring of $G_2$.

7 return the better one among $SOL_1$ and $SOL_2$.

Algorithm Division-MHE computes two independent solutions $SOL_1$ and $SOL_2$ to graph
$G$, and then outputs the better one, where the better one means the solution making more
edges happy. For an illustration of graph $G'$ and its stars in step 3, please refer to Figure 5.
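
A minimal Python sketch of Algorithm Division-MHE is given below for illustration. The names `division_mhe`, `happy_weight` and `default_color`, as well as the encoding of edges as `(u, v, weight)` triples and of the partial coloring as a dict, are assumptions made for this sketch and do not come from the paper.

```python
from collections import defaultdict

def happy_weight(edges, coloring):
    """Total weight of the edges whose two endpoints share a color."""
    return sum(w for u, v, w in edges if coloring[u] == coloring[v])

def division_mhe(vertices, edges, partial, default_color=1):
    """Return the better of the two total colorings SOL1 and SOL2."""
    # gain[v][i]: weight of the edges of E' joining the uncolored vertex v
    # to neighbors pre-colored i (the star centered at v in G').
    gain = defaultdict(lambda: defaultdict(int))
    for u, v, w in edges:
        for a, b in ((u, v), (v, u)):
            if a not in partial and b in partial:
                gain[a][partial[b]] += w

    # SOL1: color each star center by its best neighboring color (step 3).
    sol1 = dict(partial)
    for v, by_color in gain.items():
        sol1[v] = max(by_color, key=by_color.get)

    # SOL2 and the rest of SOL1: one arbitrary color for what is left (steps 4 and 6).
    sol2 = dict(partial)
    for v in vertices:
        sol1.setdefault(v, default_color)
        sol2.setdefault(v, default_color)

    return max((sol1, sol2), key=lambda s: happy_weight(edges, s))
```

The stars are never built explicitly: the stars of $G'$ are edge-disjoint, so each uncolored center can be colored independently from its own `gain` table.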

Theorem 3.2 Algorithm Division-MHE is a 1/2-approximation algorithm for the MHE problem.

Proof: First, the algorithm obviously runs in polynomial time. 

Let $W^{org}$ be the total weight of edges already made happy by the partial coloring function
$c$. This weight is obtained by every solution.

Let $W$ be the total weight of happy edges found by Algorithm Division-MHE on graph
$G'$. Note that $W$ is the maximum total weight that can be obtained from graph $G'$. Let $E''$ be
the set of edges that have both endpoints uncolored by function $c$, and $W'' = w(E'')$
be its total weight. Then we have $OPT \le W^{org} + W + W''$.

Figure 5: An example of graph $G'$. Each edge in $G'$ has one endpoint colored and the other
endpoint uncolored. The square vertices are colored vertices, while the round vertices are
uncolored vertices. Each star (marked with a dashed circle) is centered at an uncolored vertex.
Two stars (e.g., $S_1$ and $S_2$) may share common colored vertices.

By the algorithm, we know $SOL_1 \ge W^{org} + W$ and $SOL_2 \ge W^{org} + W''$. Then the
approximation ratio 1/2 of Division-MHE is obvious since $SOL = \max\{SOL_1, SOL_2\} \ge
\frac{1}{2}(W^{org} + W + W'')$. □
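
For completeness, the step called obvious above can be spelled out as follows; it uses only the two bounds on $SOL_1$ and $SOL_2$, the bound $OPT \le W^{org} + W + W''$, and the fact that a maximum is at least an average.

```latex
\begin{aligned}
SOL &= \max\{SOL_1, SOL_2\} \;\ge\; \tfrac{1}{2}\,(SOL_1 + SOL_2) \\
    &\ge \tfrac{1}{2}\,\bigl(2W^{org} + W + W''\bigr)
     \;\ge\; \tfrac{1}{2}\,\bigl(W^{org} + W + W''\bigr)
     \;\ge\; \tfrac{1}{2}\,OPT .
\end{aligned}
```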

4 Hardness Results 
4.1 NP-hardness of MHE 

The NP-hardness of the 3-MHE problem is proved by a reduction from the Multiway Cut
problem [3].

Theorem 4.1 The 3-MHE problem is NP-hard.

Proof: Given an undirected graph $G = (V, E)$ and a terminal set $D = \{s_1, s_2, s_3\}$, the
3-Terminal Cut problem (i.e., the Multiway Cut problem with 3 terminals), which is NP-hard
[3], asks for a minimum cardinality edge set such that its removal from $G$ disconnects the
three terminals from one another. Given an instance $(G, D)$ of 3-Terminal Cut, we construct
the instance $(H, C, c)$ of 3-MHE as follows. Graph $H$ is just $G$. Color set $C$ is set to be
$\{1, 2, 3\}$. The partial function $c$ assigns colors 1, 2, 3 to vertices $s_1, s_2, s_3$, respectively. Let $c^*$
be the cardinality of an optimal 3-way cut for $(G, D)$, and $m^*$ be the number of happy edges
of an optimal vertex coloring for $(H, C, c)$. Then one can easily find that $m^* = m - c^*$, where
$m = |E(G)|$ $(= |E(H)|)$. This shows the 3-MHE problem is NP-hard. □

Corollary 4.1 The MHE problem is NP-hard. 

Proof: In the input of MHE, just set $k$ to be 3. □

Theorem 4.2 The $k$-MHE problem is NP-hard for any constant $k \ge 3$.

Proof: By Theorem 4.1 we need only focus on $k > 3$. Let $k$ be such a constant.

Given a 3-MHE instance $(G, c)$, we construct a $k$-MHE instance $(G', c')$ as follows. Build
$2(k - 3)$ vertices $x_4, y_4, x_5, y_5, \ldots, x_k, y_k$ and $k - 3$ edges $(x_i, y_i)$, $4 \le i \le k$. Vertices $x_i$ and $y_i$
are colored by color $i$, for $4 \le i \le k$. Let $v$ be a vertex in $G$ whose color given by $c$ is 1. Then
put $k - 3$ edges $(v, x_i)$, $4 \le i \le k$. This is our new instance $(G', c')$.

Obviously for $4 \le i \le k$, each edge $(x_i, y_i)$ is happy whereas each edge $(v, x_i)$ is unhappy.
So, the optimum of $(G, c)$ is just equal to the optimum of $(G', c')$ minus $k - 3$, concluding the
theorem. □
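
The padding construction in this proof can be sketched in Python as follows. The function names `pad_to_k_colors` and `max_happy_edges`, the tuple labels `("x", i)` and `("y", i)` for the new vertices, and the instance encoding are assumptions made for illustration; the exhaustive search is included only to check the claimed relation between the two optima on tiny instances.

```python
from itertools import product

def pad_to_k_colors(vertices, edges, partial, k):
    """Build the k-MHE instance (G', c') from a 3-MHE instance (G, c)."""
    # The proof attaches the gadgets to a vertex whose color under c is 1;
    # next() raises StopIteration if the instance has no such vertex.
    anchor = next(v for v in vertices if partial.get(v) == 1)
    new_vertices, new_edges, new_partial = list(vertices), list(edges), dict(partial)
    for i in range(4, k + 1):
        x, y = ("x", i), ("y", i)
        new_vertices += [x, y]
        new_partial[x] = new_partial[y] = i   # x_i and y_i are pre-colored i
        new_edges += [(x, y), (anchor, x)]    # happy edge, then unhappy edge
    return new_vertices, new_edges, new_partial

def max_happy_edges(vertices, edges, partial, k):
    """Exhaustive optimum of a k-MHE instance (exponential time, tiny inputs only)."""
    free = [v for v in vertices if v not in partial]
    best = 0
    for colors in product(range(1, k + 1), repeat=len(free)):
        coloring = {**partial, **dict(zip(free, colors))}
        best = max(best, sum(coloring[u] == coloring[v] for u, v in edges))
    return best
```

On a small instance one can check that the optimum of the padded instance exceeds the optimum of the original one by exactly $k - 3$.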

4.2 NP-hardness of MHV 

Theorem 4.3 The $k$-MHV problem is NP-hard for any constant $k \ge 3$.

Proof: By Theorem 4.2, $k$-MHE is NP-hard ($k \ge 3$). We thus reduce $k$-MHE to $k$-MHV.

Let $(G, c)$ be a $k$-MHE instance. The instance $(G', c')$ of $k$-MHV is constructed as follows.
Add $k$ vertices $x_1, x_2, \ldots, x_k$ and put an edge between $x_i$ and $v$, for each $1 \le i \le k$ and each
$v \in V(G)$. Vertex $x_i$ is colored by $i$, for $1 \le i \le k$. For every edge $(u, v) \in E(G)$, add a vertex
$y_{uv}$, and replace the edge by two edges $(u, y_{uv})$ and $(y_{uv}, v)$. This is our new instance $(G', c')$.

Since in graph $G$ there are vertices with pre-specified colors, each $x_i$ ($1 \le i \le k$) cannot
become happy no matter how the remaining vertices are colored. Every original vertex $v \in V(G)$
also cannot become happy since it is adjacent to all $x_i$'s. Let $(u, v)$ be any edge in $G$. Since
the degree of vertex $y_{uv}$ is 2, it is happy iff its two neighbors have the same color. This shows
that the optimum of the $k$-MHE instance $(G, c)$ is equal to the optimum of the $k$-MHV instance
$(G', c')$. The theorem follows. □
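
The reduction above can be sketched in Python as follows. The function names, the tuple labels `("x", i)` and `("y", u, v)` for the added vertices, and the instance encoding are assumptions of this sketch; the generic `brute_force` helper is exponential and is meant only for confirming on tiny instances that the two optima coincide. The sketch assumes a simple graph whose pre-colored vertices use at least two distinct colors.

```python
from itertools import product

def reduce_mhe_to_mhv(vertices, edges, partial, k):
    """Build the k-MHV instance (G', c') from a k-MHE instance (G, c)."""
    new_vertices, new_edges, new_partial = list(vertices), [], dict(partial)
    for i in range(1, k + 1):
        x = ("x", i)
        new_vertices.append(x)
        new_partial[x] = i                           # x_i is pre-colored i
        new_edges += [(x, v) for v in vertices]      # x_i is adjacent to all of V(G)
    for u, v in edges:
        y = ("y", u, v)                              # subdivision vertex y_uv
        new_vertices.append(y)
        new_edges += [(u, y), (y, v)]
    return new_vertices, new_edges, new_partial

def count_happy_vertices(vertices, edges, coloring):
    """A vertex is happy if it shares its color with all of its neighbors."""
    unhappy = set()
    for u, v in edges:
        if coloring[u] != coloring[v]:
            unhappy.update((u, v))
    return len(vertices) - len(unhappy)

def brute_force(vertices, partial, k, score):
    """Best score over all total colorings extending the partial coloring."""
    free = [v for v in vertices if v not in partial]
    return max(
        score({**partial, **dict(zip(free, colors))})
        for colors in product(range(1, k + 1), repeat=len(free))
    )
```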

5 Conclusions 

The MHV problem and the MHE problem are two natural graph coloring problems arising 
in the homophyly phenomenon of networks. In this paper we prove the NP-hardness of the 
MHV problem and the MHE problem, and give several approximation algorithms for these two 
problems. 

Since our algorithms Greedy-MHV, Growth-MHV and Division-MHE actually do not
care whether the color number $k$ is given in the input or whether $k$ is a constant, the $k$-MHV and
$k$-MHE problems can also be approximated within $\max\{1/k, \Omega(\Delta^{-3})\}$ and 1/2, respectively.

Improving the approximation ratios for MHV and MHE remains an immediate open problem.
It is also interesting to study the MHV and MHE problems in random graphs generated
from the classical network models, and in real-world large networks.

Appendix



A Variants of MHV 

For a vertex $v$ in the MHV problem, instead of requiring that all neighbors of $v$ have the same
color as $v$, to make $v$ happy we may require only that at least $\rho \cdot \deg(v)$ neighbors have the
same color as $v$, or only that at least $q$ neighbors have the same color as $v$, for some
global number $q$. This leads to two natural variants of the MHV problem, that is,
the SoftMHV problem and the HardMHV problem. Similarly, we can define the corresponding
variants for the $k$-MHV problem, and our results in this section naturally extend to these
variants. For simplicity, we only consider approximation algorithms for the SoftMHV and
HardMHV problems.

Fix a vertex coloring, and let $v$ be a (colored or uncolored) vertex. Define $N_i(v)$ to be the
set of vertices in $N(v)$ that have color $i$, for $1 \le i \le k$.

B MHV with Soft Threshold 

Let $\rho$ be a number in $(0, 1)$. In the soft-threshold extension of the MHV problem (SoftMHV for
short), a vertex $v$ is happy if $v$ is colored and $|N^s(v)| \ge \rho \cdot \deg(v)$. Given a connected undirected
graph $G$ and a partial coloring function $c$, the SoftMHV problem asks for a total vertex coloring
extended from $c$ that maximizes the number of happy vertices. (The number $\rho$ can be given
as a part of the input or be a constant. We do not distinguish between these two cases for
simplicity.)

B.l Algorithm for SoftMHV 

As in Definition 2.1, we define the types of vertices according to the given vertex
coloring.

Definition B.1 (Types of vertex in SoftMHV) Fix a (partial or total) vertex coloring.
Let $v$ be a vertex. Then,

1. $v$ is an H-vertex if $v$ is colored and happy;

2. $v$ is a U-vertex if

(a) $v$ is colored, and

(b) $v$ is destined to be unhappy (i.e., $\deg(v) - |N^d(v)| < \rho \cdot \deg(v)$);

3. $v$ is a P-vertex if

(a) $v$ is colored,

(b) $v$ is not yet happy (i.e., $|N^s(v)| < \rho \cdot \deg(v)$), and

(c) $v$ can become an H-vertex (i.e., $|N^s(v)| + |N^u(v)| \ge \rho \cdot \deg(v)$);

4. $v$ is an L-vertex if $v$ has not been colored.
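
The classification in Definition B.1 can be sketched in Python as follows. The sketch assumes that $N^s(v)$, $N^d(v)$ and $N^u(v)$ denote the neighbors of $v$ with the same color as $v$, with a different color, and with no color yet, so that $\deg(v) - |N^d(v)| = |N^s(v)| + |N^u(v)|$; the function name `soft_vertex_type` and the adjacency-dict encoding are hypothetical.

```python
from fractions import Fraction

def soft_vertex_type(v, adj, coloring, rho):
    """Return 'H', 'U', 'P' or 'L' for vertex v under a (partial) coloring.

    adj maps each vertex to the list of its neighbors; coloring maps the
    colored vertices to their colors; rho is the soft threshold in (0, 1).
    """
    if v not in coloring:
        return "L"
    deg = len(adj[v])
    same = sum(1 for u in adj[v] if coloring.get(u) == coloring[v])  # |N^s(v)|
    uncolored = sum(1 for u in adj[v] if u not in coloring)          # |N^u(v)|
    threshold = rho * deg
    if same >= threshold:
        return "H"   # already happy
    if same + uncolored >= threshold:
        return "P"   # not yet happy, but can still become happy
    return "U"       # destined to be unhappy
```

Passing `rho` as a `Fraction` keeps the threshold comparison exact; a float would also work but may misclassify a vertex that sits exactly on the threshold.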

We note that Algorithm Greedy-MHV is also a $1/k$-approximation algorithm for the SoftMHV
problem. To see this, we just define $L_p$ in Theorem 2.2 as the set of uncolored vertices
$v$ such that $|N^u(v)| + \max_i\{|N_i(v)|\} \ge \rho \cdot \deg(v)$, and $L_b = L - L_p$.

Theorem B.1 The SoftMHV problem can be approximated within a factor of $1/k$ in polynomial
time. □

Below we give the subset-growth approximation algorithm Growth-SoftMHV for the
SoftMHV problem. First we define the subtypes of L-vertex.

Definition B.2 (Subtypes of L-vertex in SoftMHV) Let vertex $v$ be an L-vertex in a
vertex coloring. Then,

1. $v$ is an $L_p$-vertex if $v$ is adjacent to a P-vertex,

2. $v$ is an $L_h$-vertex if

(a) $v$ is not adjacent to any P-vertex,

(b) $v$ is adjacent to an H-vertex or a U-vertex, and

(c) $v$ can become happy (that is, $|N^u(v)| + \max\{|N_i(v)| : 1 \le i \le k\} \ge \rho \cdot \deg(v)$),

3. $v$ is an $L_u$-vertex if

(a) $v$ is not adjacent to any P-vertex,

(b) $v$ is adjacent to an H-vertex or a U-vertex, and

(c) $v$ is destined to be unhappy (that is, $|N^u(v)| + \max\{|N_i(v)| : 1 \le i \le k\} < \rho \cdot \deg(v)$),

4. $v$ is an $L_f$-vertex if $v$ is not adjacent to any colored vertex.

Algorithm Growth-SoftMHV 

Input: A connected undirected graph G and a partial coloring function c. 
Output: A total vertex coloring for G. 

1 $\forall\, 1 \le i \le k$, $V_i \leftarrow \{v : c(v) = i\}$.

2 while there exist L-vertices do

3   if there exists a P-vertex $v$ then

4     $i \leftarrow c(v)$.

5     Add any $\lceil \rho \cdot \deg(v) \rceil - |N^s(v) \cap V_i|$ of its $L_p$-neighbors to vertex subset $V_i$. The types
      of all affected vertices (including $v$ and vertices in $N^2(v)$) are changed accordingly.

6   elseif there exists an $L_h$-vertex $v$ then

7     Let $V_i$ be the vertex subset in which $v$ has the maximum number of colored neighbors.

8     Add vertex $v$ and any $\lceil \rho \cdot \deg(v) \rceil - |N^s(v) \cap V_i|$ of its L-neighbors to vertex subset $V_i$.
      The types of all affected vertices (including $v$ and vertices in $N^2(v)$) are changed
      accordingly.

9   else

      Comment: There must be an $L_u$-vertex.

10    Let $v$ be any $L_u$-vertex, and $V_i$ be any vertex subset in which $v$ has colored neighbors.

11    Add vertex $v$ to subset $V_i$. The types of all affected vertices (including $v$ and vertices
      in $N(v)$) are changed accordingly.

12  endif

13 endwhile

14 return the vertex coloring $(V_1, V_2, \ldots, V_k)$.

In step 5 the algorithm adds the least number (that is, $\lceil \rho \cdot \deg(v) \rceil - |N^s(v) \cap V_i|$) of $v$'s
neighbors to subset $V_i$ to make $v$ happy. The same thing is done in step 8.

Lemma B.1 $|L_u^{new}| \le O(\Delta^2)|H^{new}|$.

Proof: Suppose Algorithm Growth-SoftMHV is to process a P-vertex $v$, which is already
colored by color $i$. When $v$ is processed, at most $\lceil \rho\Delta \rceil$ $L_p$-neighbors of $v$ are added to $V_i$. Each
of these $L_p$-neighbors has at most $\Delta - 1$ $L_h$-neighbors. In the worst case, all these $L_h$-neighbors,
plus the remaining $L_p$-neighbors of $v$, could become $L_u$-vertices when $v$ is processed. So, at
most $\lceil \rho\Delta \rceil(\Delta - 1) + (1 - \rho)\Delta = O(\Delta^2)$ $L_u^{new}$-vertices can be generated in this case.

Then suppose the algorithm is to process an $L_h$-vertex $v$. Let $V_i$ be the vertex subset in which
$v$ has the maximum number of colored neighbors. When $v$ is processed, at most $\lceil \rho\Delta \rceil - 1$ L-neighbors of $v$
are added to $V_i$. Each of these L-neighbors can have at most $\Delta - 1$ $L_h$-neighbors. In the worst
case, all these $L_h$-neighbors, plus the remaining L-neighbors of $v$, could become $L_u$-vertices
when $v$ is processed. So, at most $(\lceil \rho\Delta \rceil - 1)(\Delta - 1) + (1 - \rho)\Delta = O(\Delta^2)$ $L_u^{new}$-vertices can be
generated in this case.

When the algorithm processes an $L_u$-vertex, there are only $L_u$-vertices or $L_f$-vertices (if
any) in the current graph. So, coloring an $L_u$-vertex does not generate any new $L_u$-vertex.

By charging the number of newly generated $L_u$-vertices to the newly generated H-vertex,
we finish the proof of the lemma. □

Theorem B.2 The SoftMHV problem can be approximated within a factor of $\Omega(\Delta^{-3})$ in
polynomial time.

Proof: Each time an H-vertex is generated, at most $\lceil \rho\Delta \rceil$ L-vertices are consumed (i.e.,
colored). So, for the number of newly generated H-vertices we have
$|H^{new}| \ge (|L^{org}| - |L_u^{org}| - |L_u^{new}|)/\lceil \rho\Delta \rceil$. By Lemma B.1 we get

$$|H^{new}| \ge \frac{|L^{org}| - |L_u^{org}|}{O(\Delta^2)}.$$

Let $OPT$ be the number of happy vertices in an optimal solution to the problem. By the
same reason as in Lemma 2.2 we obtain

$$\begin{aligned}
OPT &\le |H^{org}| + |P^{org}| + |L^{org}| - |L_u^{org}| \\
    &\le |H^{org}| + \Delta|L_p^{org}| + |L^{org}| - |L_u^{org}| \\
    &\le |H^{org}| + (\Delta + 1)(|L^{org}| - |L_u^{org}|).
\end{aligned}$$

Let $SOL$ be the number of happy vertices found by Algorithm Growth-SoftMHV. Then
we have

$$\begin{aligned}
SOL &= |H^{org}| + |H^{new}| \\
    &\ge |H^{org}| + \frac{1}{O(\Delta^2)}(|L^{org}| - |L_u^{org}|) \\
    &\ge \frac{1}{O(\Delta^3)}\Bigl(|H^{org}| + (\Delta + 1)(|L^{org}| - |L_u^{org}|)\Bigr) \\
    &\ge \frac{1}{O(\Delta^3)}\,OPT \\
    &= \Omega(\Delta^{-3})\,OPT.
\end{aligned}$$

Finally, notice that Algorithm Growth-SoftMHV obviously runs in polynomial time.
This gives the theorem. □

B.2 NP-Hardness of SoftMHV



Theorem B.3 For any real number $0 < \rho < 1$, there exist infinitely many integers $k \ge 3$, such
that the corresponding SoftMHV problem is NP-hard.

Proof: Reduce from 3-MHE. Let $(G, c)$ be an instance of 3-MHE, and $\rho$ be any real constant
in $(0, 1)$. We shall construct a SoftMHV instance $(G', c')$ in the following, in which the color
number $k \ge 3$ is an integer that depends only on $\rho$. The value of $k$ will be given later.

Let $h$ be an integer constant depending on $\rho$ and $k$, which will be fixed later. For every edge
$(u, v) \in E(G)$, add $h + k + 1$ vertices $x_{uv}$ (called x-vertex), $y_1^{uv}, y_2^{uv}, \ldots, y_h^{uv}$ (called y-vertices),
$z_1^{uv}, z_2^{uv}, \ldots, z_k^{uv}$ (called z-vertices). Replace edge $(u, v)$ by two consecutive edges $(u, x_{uv})$ and
$(x_{uv}, v)$. For each vertex $a \in \{y_1^{uv}, \ldots, y_h^{uv}, z_1^{uv}, \ldots, z_k^{uv}\}$, connect it to $x_{uv}$ via an edge $(a, x_{uv})$.
For $1 \le i \le k$, vertex $z_i^{uv}$ has a pre-specified color $i$.

For every vertex $v \in V(G)$, add $\Delta \cdot k$ vertices $w_{1,1}^v, w_{1,2}^v, \ldots, w_{1,\Delta}^v$, $w_{2,1}^v, w_{2,2}^v, \ldots, w_{2,\Delta}^v$,
$\ldots$, $w_{k,1}^v, w_{k,2}^v, \ldots, w_{k,\Delta}^v$ (called w-vertices), where $\Delta$ is the maximum vertex degree of $G$. For
each $1 \le i \le k$ and each $1 \le j \le \Delta$, connect vertex $w_{i,j}^v$ to $v$ via an edge $(w_{i,j}^v, v)$. Vertex $w_{i,j}^v$ is
colored in advance by color $i$, $1 \le i \le k$, $1 \le j \le \Delta$. This is our graph $G'$ in the new instance
of SoftMHV.

Next we determine constants $h$ and $k$. To enable the reduction to work, $h$ and $k$ should
satisfy

$$h + 3 \ge \rho(h + k + 2), \qquad (1)$$
$$h + 2 < \rho(h + k + 2). \qquad (2)$$

Let $(u, v)$ be any edge in $G$. Consider vertex $x_{uv}$ in $G'$. No matter how $x_{uv}$ is colored (recall
that the color set is $\{1, 2, \ldots, k\}$), there is exactly one vertex in $\{z_1^{uv}, \ldots, z_k^{uv}\}$ having the same
color as that of $x_{uv}$. Note that $\deg_{G'}(x_{uv}) = h + k + 2$. So, inequality (1) guarantees that if
all vertices in $\{u, v, y_1^{uv}, \ldots, y_h^{uv}\}$ have the same color as that of $x_{uv}$, $x_{uv}$ will be happy, and
inequality (2) guarantees that if there is one vertex in $\{u, v, y_1^{uv}, \ldots, y_h^{uv}\}$ having a different color
from that of $x_{uv}$, $x_{uv}$ will be unhappy.

By inequality (1) and inequality (2), the value of integer $h$ should satisfy

$$h \in \left[\frac{\rho k + 2\rho - 3}{1 - \rho},\ \frac{\rho k + 2\rho - 2}{1 - \rho}\right). \qquad (3)$$

Since $\rho k + 2\rho - 2 = (\rho k + 2\rho - 3) + 1$ and $1 - \rho < 1$, there must be at least one integer in the
interval of (3).

Of course, the left end of the interval of (3) should be at least 1. This gives

$$k \ge \frac{4}{\rho} - 3. \qquad (4)$$

For each vertex $v \in V(G')$ that comes from $G$, we want to guarantee that no matter how
the vertices in $G'$ are colored, $v$ will never be happy. Note that no matter what color vertex $v$
is colored by, there are exactly $\Delta$ vertices in $\{w_{i,j}^v : 1 \le i \le k, 1 \le j \le \Delta\}$ having the same color
as that of $v$. Since $\deg_G(v) \le \Delta$ and $\deg_{G'}(v) \ge k\Delta + 1$, to make vertex $v$ unsatisfiable, we just
need $2\Delta/(k\Delta + 1) < \rho$. Since $2\Delta/(k\Delta + 1) < 2\Delta/(k\Delta)$, this will be guaranteed by letting

$$k \ge \frac{2}{\rho}. \qquad (5)$$

Since we start our reduction from the 3-MHE problem, naturally we need

$$k \ge 3. \qquad (6)$$

By inequalities (4), (5) and (6), we can set $k$ as any integer such that

$$k \ge \max\left\{\frac{4}{\rho} - 3,\ \frac{2}{\rho},\ 3\right\}.$$

Once $k$ is fixed, we can fix $h$ according to (3).

We have completed our new instance $(G', c')$ of SoftMHV.
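
The choice of the constants can be illustrated with a short Python sketch. It picks the smallest integer $k$ allowed by (4), (5) and (6), then the smallest integer $h$ in the interval (3), and re-checks inequalities (1) and (2). The function name `choose_k_and_h` is hypothetical, and exact rational arithmetic is used so that the boundary cases of (1) and (2) are decided correctly.

```python
from fractions import Fraction
from math import ceil

def choose_k_and_h(rho):
    """Smallest k satisfying (4)-(6) and smallest h in interval (3), for 0 < rho < 1."""
    rho = Fraction(rho)
    assert 0 < rho < 1
    # (4): k >= 4/rho - 3,  (5): k >= 2/rho,  (6): k >= 3.
    k = max(3, ceil(4 / rho - 3), ceil(2 / rho))
    # Left end of interval (3); the interval has length 1/(1 - rho) > 1,
    # so the ceiling of its left end lies inside it.
    h = ceil((rho * k + 2 * rho - 3) / (1 - rho))
    assert h >= 1
    assert h + 3 >= rho * (h + k + 2)   # inequality (1)
    assert h + 2 < rho * (h + k + 2)    # inequality (2)
    return k, h
```

For example, $\rho = 1/2$ gives $k = 5$ and $h = 1$, so that $\deg_{G'}(x_{uv}) = 8$ and $x_{uv}$ needs exactly $h + 3 = 4$ neighbors of its own color.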

Let $m = |E(G)|$ and $n = |V(G)|$. Denote by $m^*$ the number of happy edges in an optimal
solution to the 3-MHE instance $(G, c)$, and by $n^*$ the number of happy vertices in an optimal
solution to the SoftMHV instance $(G', c')$. We shall prove the following claim, which will finish
the proof of the theorem.

Claim 1 $m^* \ge m_0 \iff n^* \ge \Delta n + (h + 1)m + m_0$.



Proof: ($\Rightarrow$) Let $c^*$ be an optimal solution to instance $(G, c)$. First we color every vertex
$v \in V(G')$ such that $v$ is also in $G$ by color $c^*(v)$. For each edge $(u, v) \in E(G)$, color $x_{uv}$ by
color $c^*(u)$ (actually coloring $x_{uv}$ by either $c^*(u)$ or $c^*(v)$ is fine), and color all vertices $y_1^{uv}, \ldots,
y_h^{uv}$ by the color of $x_{uv}$. This is our vertex coloring for instance $(G', c')$.

By similar arguments before inequality (5), for every vertex $v \in V(G) \cap V(G')$ and its
corresponding w-vertices in $G'$, $v$ itself is unhappy and there are exactly $\Delta$ happy vertices in
$\{w_{i,j}^v : 1 \le i \le k, 1 \le j \le \Delta\}$. So we obtain $\Delta n$ happy vertices from all w-vertices in $G'$.

Let $(u, v)$ be an edge in $G$. In its corresponding y-vertices $\{y_1^{uv}, \ldots, y_h^{uv}\}$ and z-vertices
$\{z_1^{uv}, \ldots, z_k^{uv}\}$, there are exactly $h + 1$ vertices that are happy by our coloring. So we obtain
$(h + 1)m$ happy vertices from all the y-vertices and z-vertices in $G'$.

Next let us consider vertex $x_{uv}$. If $(u, v)$ is happy by $c^*$, then $x_{uv}$ has $h + 3$ neighbors having
the same color as that of $x_{uv}$. So the fraction of happy neighbors of $x_{uv}$ is

$$\frac{h + 3}{h + k + 2} \ge \frac{\frac{\rho k + 2\rho - 3}{1 - \rho} + 3}{\frac{\rho k + 2\rho - 3}{1 - \rho} + k + 2} = \rho,$$

where the first inequality is due to $h \ge \frac{\rho k + 2\rho - 3}{1 - \rho}$ (by inequality (1)), and hence $x_{uv}$ is happy.
Since $m^* \ge m_0$, we can obtain at least $m_0$ happy vertices from all x-vertices in $G'$.

Summing up, the number of happy vertices in $G'$ by our coloring is at least $\Delta n + (h + 1)m + m_0$.

($\Leftarrow$) Let $c'^*$ be an optimal solution to instance $(G', c')$ of SoftMHV. By the arguments
before inequality (5), every vertex $v \in V(G') \cap V(G)$ is unhappy by $c'^*$, and there are exactly
$\Delta n$ happy w-vertices by $c'^*$.

Let $(u, v)$ be any edge in $G$. Since $c'^*$ is an optimal coloring, we can assume that all vertices
$y_1^{uv}, \ldots, y_h^{uv}$ have color $c'^*(x_{uv})$. Taking into account the one more happy vertex $z_i^{uv}$ (where
$i = c'^*(x_{uv})$) for each $(u, v) \in E(G)$, there are exactly $(h + 1)m$ happy vertices by $c'^*$ from all
y-vertices and z-vertices.

Now only x-vertices in $G'$ remain unconsidered. Since $n^* \ge \Delta n + (h + 1)m + m_0$, there must
be at least $m_0$ happy x-vertices. Let $x_{uv}$ be any such vertex. Since $x_{uv}$ is happy, the number of
neighbors of $x_{uv}$ that have the same color as that of $x_{uv}$ is at least

$$\rho(h + k + 2) > \rho\left(h + \frac{h - \rho h - 2\rho + 2}{\rho} + 2\right) = h + 2,$$

where the first inequality is due to inequality (2). This shows that the number of neighbors of
$x_{uv}$ having color $c'^*(x_{uv})$ is at least $h + 3$. So, vertices $u$ and $v$ must have the same color (as
that of $x_{uv}$).

Let us color every vertex $v \in V(G)$ by color $c'^*(v)$. If there are vertices in $G$ colored by
colors in $\{4, 5, \ldots, k\}$, then color all of them by color 1 (note that $G$ is part of the instance of
the 3-MHE problem). This will never decrease the number of happy edges in $G$. By the above
analysis, the number of happy edges in $G$ is at least $m_0$. □

The proof of the theorem is finished. □

C MHV with Hard Threshold 

In the hard-threshold variant of the $k$-MHV problem (HardMHV for short), a vertex $v$ is happy
if $|N^s(v)| \ge q$, where $q$ is an input parameter. Given a connected undirected graph $G$, a partial
coloring function $c$, and an integer $q > 0$, the HardMHV problem asks for a total vertex coloring
extended from $c$ that maximizes the number of happy vertices. It is reasonable to assume $q \le \Delta$,
since otherwise there is no feasible solution to the problem.

C.l Algorithm for HardMHV 

The following type definition of vertices is similar to Definition B.1.

Definition C.1 (Types of vertex in HardMHV) Fix a (partial or total) vertex coloring.
Let $v$ be a vertex. Then,

1. $v$ is an H-vertex if $v$ is colored and happy,

2. $v$ is a U-vertex if

(a) $v$ is colored, and

(b) $v$ is destined to be unhappy (i.e., $\deg(v) - |N^d(v)| < q$),

3. $v$ is a P-vertex if

(a) $v$ is colored,

(b) $v$ is not yet happy (that is, $|N^s(v)| < q$), and

(c) $v$ can become happy (i.e., $|N^s(v)| + |N^u(v)| \ge q$),

4. $v$ is an L-vertex if $v$ has not been colored.

As in the case of SoftMHV, Algorithm Greedy-MHV is also a $1/k$-approximation
algorithm for the HardMHV problem. To prove this we only need to define $L_p$ in Theorem 2.2
as the set of uncolored vertices $v$ such that $|N^u(v)| + \max_i\{|N_i(v)|\} \ge q$, and $L_b = L - L_p$.

Theorem C.1 There is a $1/k$-approximation algorithm for the HardMHV problem. □

In the MHV and SoftMHV problems, for an L-vertex $v$, if $|N^d(v)|$ is too large, then $v$ may
be destined to be unhappy. In contrast, in the HardMHV problem, an L-vertex $v$ may be
destined to be unhappy even if $|N^d(v)| = 0$. This happens when $\deg(v) < q$. Based on this
observation, the L-vertex type is divided into the following four subtypes.

Definition C.2 (Subtypes of L-vertex in HardMHV) Let vertex $v$ be an L-vertex in a
vertex coloring. Then,

1. $v$ is an $L_p$-vertex if $v$ is adjacent to a P-vertex,

2. $v$ is an $L_h$-vertex if

(a) $v$ is not adjacent to any P-vertex,

(b) $v$ is adjacent to an H-vertex or a U-vertex, and

(c) $v$ can become happy (i.e., $|N^u(v)| + \max\{|N_i(v)| : 1 \le i \le k\} \ge q$),

3. $v$ is an $L_u$-vertex if

(a) $v$ is not adjacent to any P-vertex, and

(b) $v$ is destined to be unhappy (i.e., $|N^u(v)| + \max\{|N_i(v)| : 1 \le i \le k\} < q$),

4. $v$ is an $L_f$-vertex if

(a) $v$ is not adjacent to any colored vertex, and

(b) $v$ can become happy.

One can verify that the subtypes in Definition C.2 really form a partition of all L-vertices.
Note that the $L_u$-vertex not only refers to the destined-to-be-unhappy L-vertex that is adjacent
to an H-vertex or a U-vertex (like the $L_u$-vertex in MHV and the $L_u$-vertex in SoftMHV), but
also refers to the destined-to-be-unhappy L-vertex that is not adjacent to any colored vertex,
as discussed before Definition C.2.

Below is the subset-growth approximation algorithm Growth-HardMHV for the HardMHV 
problem. 

Algorithm Growth-HardMHV 

Input: A connected undirected graph G, a partial coloring function c, and an integer q > 0. 
Output: A total vertex coloring for G. 

1 $\forall\, 1 \le i \le k$, $V_i \leftarrow \{v : c(v) = i\}$.

2 while there exist L-vertices do

3   if there exists a P-vertex $v$ then

4     $i \leftarrow c(v)$.

5     Add any $q - |N^s(v) \cap V_i|$ of its $L_p$-neighbors to $V_i$. The types of all affected vertices
      (including $v$ and vertices in $N^2(v)$) are changed accordingly.

6   elseif there exists an $L_h$-vertex $v$ then

7     Let $V_i$ be the vertex subset in which $v$ has the maximum number of colored neighbors.

8     Add vertex $v$ and any $q - |N^s(v) \cap V_i|$ of its L-neighbors to $V_i$. The types of all affected
      vertices (including $v$ and vertices in $N^2(v)$) are changed accordingly.

9   else

      Comment: There must be an $L_u$-vertex.

10    Let $v$ be any $L_u$-vertex. If $v$ has colored neighbors, then let $V_i$ be any vertex subset
      containing a colored neighbor of $v$. Otherwise let $V_i$ be $V_1$.

11    Add vertex $v$ to subset $V_i$. The types of all affected vertices (including $v$ and vertices
      in $N(v)$) are changed accordingly.

12  endif

13 endwhile

14 return the vertex coloring $(V_1, V_2, \ldots, V_k)$.

Lemma C.1 $|L_u^{new}| \le O(\Delta^2)|H^{new}|$.

Proof: The proof of the lemma is similar to that of Lemma B.1. Only one point needs
attention. When the algorithm processes an $L_u$-vertex, there are only $L_u$-vertices or
$L_f$-vertices (if any) in the current graph. Each time Algorithm Growth-HardMHV processes an
$L_u$-vertex $v$, it processes only one such vertex. So, if $v$ has an $L_f$-neighbor $u$, $u$ will become an
$L_h$-vertex after the processing. This means that coloring an $L_u$-vertex does not generate any
new $L_u$-vertex. We omit the other details of the proof. □

Theorem C.2 The HardMHV problem can be approximated within a factor of $\Omega(\Delta^{-3})$ in
polynomial time.

Proof: Each time an H-vertex is generated, at most $q$ L-vertices are consumed (i.e., colored).
So, for the number of newly generated H-vertices we have
$|H^{new}| \ge (|L^{org}| - |L_u^{org}| - |L_u^{new}|)/q$.
By Lemma C.1 and noticing that $q \le \Delta$, we get

$$|H^{new}| \ge \frac{|L^{org}| - |L_u^{org}|}{O(\Delta^2)}.$$

Let $OPT$ be the number of happy vertices in an optimal solution to the problem. By the
same reason as in Lemma 2.2 we obtain

$$OPT \le |H^{org}| + (\Delta + 1)(|L^{org}| - |L_u^{org}|).$$

Let $SOL$ be the number of happy vertices found by Algorithm Growth-HardMHV. Then
we have $SOL = |H^{org}| + |H^{new}| = \Omega(\Delta^{-3})\,OPT$ by the above two inequalities. As Algorithm
Growth-HardMHV obviously runs in polynomial time, the theorem follows. □



C.2 NP-Hardness of HardMHV 

Theorem C.3 The HardMHV problem is NP-hard for any constant $k \ge 3$, where $k$ is the color
number in the problem.

Proof: We prove the theorem by reducing $k$-MHE (see Theorem 4.2) to HardMHV.

Given an instance $(G, c)$ of $k$-MHE, we construct an instance $(G', c', q)$ of HardMHV as
follows. For each edge $(u, v) \in E(G)$, do the following. Add a vertex $x_{uv}$ and $\Delta - 1$ vertices
$y_1^{uv}, y_2^{uv}, \ldots, y_{\Delta-1}^{uv}$, where $\Delta$ is the maximum vertex degree of $G$. The vertices $y_i^{uv}$ are called
satellite vertices. Replace edge $(u, v)$ by two edges $(u, x_{uv})$ and $(x_{uv}, v)$. Connect each vertex
$y_i^{uv}$ to $x_{uv}$ via an edge $(x_{uv}, y_i^{uv})$. Finally, let $q = \Delta + 1$. We thus get our HardMHV instance
$(G', c', q)$.

Since $q = \Delta + 1$, each original vertex $v \in V(G)$ and each newly added satellite vertex cannot
be happy no matter how the vertices in $G'$ are colored. For each edge $(u, v) \in E(G)$, since its
corresponding vertex $x_{uv}$ is of degree $\Delta + 1$, $x_{uv}$ is happy iff its two neighbors $u$ and $v$ have the
same color. This shows that the optimum of $(G, c)$ is equal to that of $(G', c', q)$, finishing the
proof of the theorem. □
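
The construction in this proof can be sketched in Python as follows. The function names, the tuple labels `("x", u, v)` and `("y", u, v, j)` for the added vertices, and the instance encoding are assumptions of this sketch; the exhaustive search is exponential and serves only to check on a tiny instance that the HardMHV optimum equals the $k$-MHE optimum.

```python
from itertools import product

def reduce_mhe_to_hardmhv(vertices, edges, partial):
    """Build the HardMHV instance (G', c', q) from a k-MHE instance (G, c)."""
    degree = {v: 0 for v in vertices}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    delta = max(degree.values())               # maximum vertex degree of G
    new_vertices, new_edges = list(vertices), []
    for u, v in edges:
        x = ("x", u, v)
        satellites = [("y", u, v, j) for j in range(1, delta)]  # delta - 1 of them
        new_vertices += [x] + satellites
        new_edges += [(u, x), (x, v)] + [(x, s) for s in satellites]
    return new_vertices, new_edges, dict(partial), delta + 1

def count_hard_happy(vertices, edges, coloring, q):
    """A vertex is happy if at least q of its neighbors share its color."""
    same = {v: 0 for v in vertices}
    for u, v in edges:
        if coloring[u] == coloring[v]:
            same[u] += 1
            same[v] += 1
    return sum(1 for v in vertices if same[v] >= q)

def brute_force_hardmhv(vertices, edges, partial, q, k):
    """Exhaustive HardMHV optimum (tiny inputs only)."""
    free = [v for v in vertices if v not in partial]
    return max(
        count_hard_happy(vertices, edges, {**partial, **dict(zip(free, colors))}, q)
        for colors in product(range(1, k + 1), repeat=len(free))
    )
```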